Use meta device tensor to infer contiguity for expr-eval segments #5772

base: resetContiguityFromTensor
Conversation
Review updated until commit c81f895

Description

| Category | Relevant files |
|---|---|
| Configuration changes | |
| Enhancement | 6 files |
| Bug fix | 1 file |
| Miscellaneous | 1 file |
| Tests | 12 files |
PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Backward Compatibility

The InferContiguity option changes the behavior of MatmulOp::evaluate(). When disabled, it uses the old logic that assumes contiguous outputs; when enabled, it uses the new logic that infers actual contiguity. This could break existing code that depends on the old behavior. The PR should document this breaking change clearly and provide migration guidance.
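For concreteness, here is a minimal C++ sketch of the behavioral switch described above. It is illustrative rather than nvfuser's actual `MatmulOp::evaluate`, and the `infer_contiguity_enabled` flag stands in for the real option lookup:

```cpp
#include <ATen/ATen.h>

// Illustrative sketch: the old path forces a contiguous output; the new
// path keeps whatever strides ATen actually produced so that contiguity
// can be inferred downstream.
at::Tensor evaluateMatmul(
    const at::Tensor& a,
    const at::Tensor& b,
    bool infer_contiguity_enabled) {
  at::Tensor out = at::matmul(a, b);
  if (!infer_contiguity_enabled) {
    // Old behavior: downstream segments may assume contiguous strides.
    return out.contiguous();
  }
  // New behavior: preserve the actual strides.
  return out;
}
```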
…andling

- Renamed `inferOutputShapeAndContiguousStrides` to `inferContiguousOutputMetaTensor` for clarity.
- Updated function signatures to remove unnecessary parameters.
- Introduced `inferOutputMetaTensor` in `FusionKernelRuntime` to handle output shape inference for segmented groups.
- Enhanced `updateWithSegmentOutputs` to streamline output management without updating contiguity directly.
- Improved overall code organization and readability.
Greptile Summary

This PR fixes issue #4888, where `FusionExecutorCache` assumed that fusion segments always produce contiguous outputs, which does not hold for `ExpressionEvaluator` segments.

The implementation leverages PyTorch's meta device to compute shapes/strides without materializing actual tensors. The PR description mentions that some ATen ops' meta-device implementations are Python-based and can hang when called from C++ (due to GIL acquisition issues), requiring manual shape/stride computation using `at::empty_strided`.

Confidence Score: 4/5
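A hedged sketch of that manual fallback, assuming a simple row-major layout for illustration (a real implementation would compute op-specific strides):

```cpp
#include <ATen/ATen.h>

#include <vector>

// Build a result tensor on the meta device from manually computed
// sizes/strides, sidestepping a Python-based meta kernel entirely.
at::Tensor makeMetaResult(at::IntArrayRef sizes, at::ScalarType dtype) {
  std::vector<int64_t> strides(sizes.size());
  int64_t stride = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    strides[i] = stride;
    stride *= sizes[i];
  }
  return at::empty_strided(
      sizes, strides, at::TensorOptions().dtype(dtype).device(at::kMeta));
}
```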
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
participant User
participant FusionKernelRuntime
participant prepareInputs/getMaybeHeuristicsFor
participant inferOutputMetaTensor
participant ExpressionEvaluator
participant ATen as ATen (Meta Device)
participant updateContiguityOfSegmentOutputs
participant TensorView
User->>FusionKernelRuntime: runWithInputs(args)
FusionKernelRuntime->>prepareInputs/getMaybeHeuristicsFor: Prepare segment inputs
loop For each segment
prepareInputs/getMaybeHeuristicsFor->>inferOutputMetaTensor: Infer output shape/stride
alt is_expr_eval && InferContiguity enabled
inferOutputMetaTensor->>ExpressionEvaluator: Create ExpressionEvaluator
loop For each input
inferOutputMetaTensor->>ATen: at::empty_strided(sizes, strides, device=meta)
ATen-->>inferOutputMetaTensor: meta tensor
inferOutputMetaTensor->>ExpressionEvaluator: bind(input, meta_tensor)
end
loop For each output
ExpressionEvaluator->>ATen: evaluate() - run ATen ops on meta device
ATen-->>ExpressionEvaluator: result meta tensor with actual strides
ExpressionEvaluator-->>inferOutputMetaTensor: result
end
else not expr_eval or InferContiguity disabled
inferOutputMetaTensor->>inferOutputMetaTensor: inferContiguousOutputMetaTensor()
Note right of inferOutputMetaTensor: Assumes contiguous output
end
inferOutputMetaTensor-->>prepareInputs/getMaybeHeuristicsFor: group_runtime_outputs
prepareInputs/getMaybeHeuristicsFor->>updateContiguityOfSegmentOutputs: Update TensorView contiguity
alt InferContiguity enabled
loop For each output TensorView
updateContiguityOfSegmentOutputs->>TensorView: ir_utils::resetContiguityFromTensor(tv, tensor)
Note right of TensorView: Updates contiguity info from actual tensor strides
end
end
updateContiguityOfSegmentOutputs-->>prepareInputs/getMaybeHeuristicsFor: done
end
prepareInputs/getMaybeHeuristicsFor-->>FusionKernelRuntime: all_runtime_inputs prepared
FusionKernelRuntime->>FusionKernelRuntime: Execute segments with correct stride info
```
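To make the meta-device step in the diagram concrete, here is a small self-contained sketch in plain ATen (not nvfuser code) showing how evaluating an op on meta tensors recovers real output strides without allocating data:

```cpp
#include <ATen/ATen.h>

int main() {
  // A contiguous 4x8 input that exists only as metadata.
  at::Tensor in = at::empty_strided(
      {4, 8},
      {8, 1},
      at::TensorOptions().dtype(at::kFloat).device(at::kMeta));
  // Slicing columns keeps the parent's strides, so the result is
  // non-contiguous: sizes {4, 4}, strides {8, 1}. This is exactly the
  // kind of output an ExpressionEvaluator segment can produce.
  at::Tensor sliced = in.slice(/*dim=*/1, /*start=*/0, /*end=*/4);
  return sliced.is_contiguous() ? 1 : 0; // returns 0: non-contiguous
}
```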
20 files reviewed, 1 comment
```diff
  auto fusion_to_run = segmented_fusion_->makeFusion(group_to_run).second;
- auto group_runtime_outputs = inferOutputShapeAndContiguousStrides(
-     fusion_to_run.get(), group_runtime_inputs);
+ auto group_runtime_outputs = inferOutputMetaTensor(
```
I'm losing track of the code. Does group_runtime_inputs contain meta tensors or real tensors at this moment? The setDeviceIndex call seems to say they are real tensors.
IIUC, in prepareInputs, group_runtime_inputs contains real tensors (but inferOutputShapeAndContiguousStrides still returns meta tensors), whereas in getMaybeHeuristicsFor, group_runtime_inputs contains meta tensors.
Got it. Should setDeviceIndex at line 419 be removed? Is it safe or necessary? (I don't think your PR changes the situation; just OOC).
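(For anyone following this thread, a hypothetical debugging helper; `reportDeviceKinds` is not part of the PR, but `Tensor::is_meta()` is a real ATen API that answers exactly this question:)

```cpp
#include <ATen/ATen.h>

#include <iostream>
#include <vector>

// Hypothetical helper: print whether each argument is a meta or a real
// tensor, e.g. to inspect group_runtime_inputs at a breakpoint.
void reportDeviceKinds(const std::vector<at::Tensor>& args) {
  for (const at::Tensor& t : args) {
    std::cout << (t.is_meta() ? "meta" : "real") << " tensor on "
              << t.device() << "\n";
  }
}
```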
Co-authored-by: Jingyue Wu <[email protected]>
```diff
  args_manager.updateWithSegmentOutputs(
      group_to_run->outputs(), group_runtime_outputs, run_order_id);
+
+ updateContiguityOfSegmentOutputs(group_to_run, group_runtime_outputs);
```
Is this to hide some bugs in mark_aliases_prepare or allocation_order_inference? The TensorViews in the complete fusion and therefore in segments ought to be correct after preseg.
How do you define "hide a bug"? We need the correct contiguity eventually, which is only possible after we know the scheduler from segmentation. So why isn't this just writing the correct information, instead of hiding a bug?
> which is only possible after we know the scheduler of segmentation

But scheduling happens after prepareInputs:

Fuser/csrc/runtime/fusion_kernel_runtime.cpp, line 431 in 352dcbf:

```cpp
compileKernel(group_runtime_inputs, group_to_run);
```
I'm probably missing some important details that are so obvious to you. Let me try to remove this line and see where things break...
`$ _bn && pytest tests/python/direct/test_python_frontend.py -k test_issue4888 -vs` passes with the following patch:

```diff
diff --git a/csrc/runtime/fusion_kernel_runtime.cpp b/csrc/runtime/fusion_kernel_runtime.cpp
index e025d29d..132cba82 100644
--- a/csrc/runtime/fusion_kernel_runtime.cpp
+++ b/csrc/runtime/fusion_kernel_runtime.cpp
@@ -427,8 +427,6 @@ std::vector<KernelArgumentHolder> FusionKernelRuntime::prepareInputs(
     // map output args to tensor map
     args_manager.updateWithSegmentOutputs(
         group_to_run->outputs(), group_runtime_outputs, run_order_id);
-
-    updateContiguityOfSegmentOutputs(group_to_run, group_runtime_outputs);
   }
   return all_runtime_inputs;
```

But let me try other tests as well...
I missed the other call to updateContiguityOfSegmentOutputs. After removing that, I see SegmentationTest.RevertPrivatizedUpcast fails. Let me try to understand the error...
```
$ bin/test_nvfuser --gtest_filter=SegmentationTest.RevertPrivatizedUpcast
Running main() from /opt/pytorch/nvfuser/third_party/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = SegmentationTest.RevertPrivatizedUpcast
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SegmentationTest
[ RUN      ] SegmentationTest.RevertPrivatizedUpcast
/opt/pytorch/nvfuser/tests/cpp/test_segmentation.cpp:855: Failure
Expected equality of these values:
  num_upcast_ops
    Which is: 1
  2
To reproduce: NVFUSER_TEST_RANDOM_SEED=1768609993 NVFUSER_TEST_ATEN_RANDOM_SEED=0 test_nvfuser --gtest_filter='SegmentationTest.RevertPrivatizedUpcast'
[  FAILED  ] SegmentationTest.RevertPrivatizedUpcast (218 ms)
[----------] 1 test from SegmentationTest (218 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (218 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] SegmentationTest.RevertPrivatizedUpcast

1 FAILED TEST
```
Fixes #4888
Stacked on #5766
I used to work on #5082 for this fix, but I hit too many blockers: that PR interacts with many new assumptions/hacks/unfinalized designs around allocation domains, stream-sharded tensors, multi-device, etc., and new changes kept landing on the main branch that broke #5082. This delayed the PR for a very long time, so I recreated it as this PR, which is more friendly to incremental development.
Today, in the main branch, `FusionExecutorCache` assumes that fusion segments always generate contiguous tensors. This is not true for `ExpressionEvaluator` segments; for example, ATen's slice op returns non-contiguous tensors. It is worth mentioning that, because segmentation and scheduler selection depend on inputs, the contiguity of intermediate results also depends on inputs.

This PR adds `FusionKernelRuntime::inferOutputMetaTensor`, which replaces `inferOutputShapeAndContiguousStrides`, to infer the output shape and stride of each segment. Both `FusionKernelRuntime::inferOutputMetaTensor` and `inferOutputShapeAndContiguousStrides` store their result as a tensor on the meta device. The difference is that `FusionKernelRuntime::inferOutputMetaTensor` will actually run the segment on the meta device if the segment is scheduled to run by `ExpressionEvaluator`, while `inferOutputShapeAndContiguousStrides` just assumes the output to be contiguous.

Because `FusionKernelRuntime::inferOutputMetaTensor` will run the segment on the meta device, the relevant ops' `MyOp::evaluate` must work for the meta device. There is good news and bad news for this design. The good news is that most `MyOp::evaluate` implementations just call `at::` ops, which usually already support the meta device, and PyTorch designed the meta device to keep its behavior on par with CUDA. The bad news is that many ops' meta-device implementations live in Python, and running such an `at::` op from C++ would hang due to the inability to grab Python's GIL (thanks @naoyam for help debugging!). In that case, the corresponding `MyOp::evaluate` must manually compute the shape and stride and use `at::empty_strided(device=meta)` to create the result.

Besides `FusionKernelRuntime::inferOutputMetaTensor`, this PR also adds `FusionKernelRuntime::updateContiguityOfSegmentOutputs`, which updates the segment output `TensorView`s' contiguity based on the inferred shape and stride.

This PR adds an enable option, "infer-contiguity", to roll out this feature incrementally. When "infer-contiguity" is disabled, `FusionKernelRuntime::inferOutputMetaTensor` falls back to the behavior of `inferOutputShapeAndContiguousStrides`, and `FusionKernelRuntime::updateContiguityOfSegmentOutputs` is a no-op. The plan is to merge this PR without enabling "infer-contiguity" for the currently failing tests, and I will write new PRs fixing those tests one by one.
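For intuition, here is a hedged sketch of what "update contiguity from the inferred tensor" amounts to; the PR's actual logic lives in `ir_utils::resetContiguityFromTensor` and `FusionKernelRuntime::updateContiguityOfSegmentOutputs`, which also handle allocation domains and broadcast dimensions that this illustration ignores:

```cpp
#include <ATen/ATen.h>

#include <vector>

// Derive per-dimension contiguity flags from sizes/strides, e.g. taken
// from a meta tensor produced by segment evaluation. A dimension is
// contiguous iff its stride equals the next inner dimension's stride
// times that dimension's size.
std::vector<bool> contiguityFromStrides(
    at::IntArrayRef sizes,
    at::IntArrayRef strides) {
  std::vector<bool> contiguity(sizes.size());
  int64_t inner_stride = 1; // implicit stride past the innermost dim
  int64_t inner_size = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    contiguity[i] = (strides[i] == inner_stride * inner_size);
    inner_stride = strides[i];
    inner_size = sizes[i];
  }
  return contiguity;
}

// For ATen's slice example above: sizes {4, 4} with strides {8, 1} yield
// {false, true} -- the outer dimension is non-contiguous.
```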